Citation distributions are crucial for the analysis and modeling of the activity of scientists. We investigated bibliometric data of papers published in journals of the American Physical Society, searching for the type of function that best describes the observed citation distributions. We used the goodness of fit with Kolmogorov-Smirnov statistics for three classes of functions: log-normal, simple power law and shifted power law. The shifted power law turns out to be the most reliable hypothesis for all citation networks we derived, which correspond to different time spans. We find that citation dynamics is characterized by bursts, usually occurring within a few years after the publication of a paper, and that the burst size spans several orders of magnitude. We also investigated the microscopic mechanisms for the evolution of citation networks by proposing a linear preferential attachment with time-dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and also accounts for the presence of citation bursts.



I. INTRODUCTION 

Citation networks are compact representations of the relationships between research products, both in the sciences and the humanities […]. As such, they are a valuable tool to uncover the dynamics of scientific productivity and have been studied for a long time, since the seminal paper by De Solla Price […]. In recent years in particular, thanks to the increasing availability of large bibliographic data and computational resources, it has become possible to build large networks and analyze them with unprecedented accuracy.

In a citation network, each vertex represents a paper and there is a directed edge from paper A to paper B if A includes B in its list of references. Citation networks are therefore directed by construction, and acyclic, since papers can only point to older papers, so directed loops cannot occur. A large part of the literature on citation networks has focused on characterizing the probability distribution of the number of citations received by a paper, and on designing simple microscopic models able to reproduce that distribution. The number of citations of a paper is the number of incoming edges (indegree) $k_{\rm in}$ of the vertex representing the paper in the citation network, so the probability distribution of citations is just the indegree distribution $P(k_{\rm in})$. There is no doubt that citation distributions are broad, as there are papers with many citations together with many poorly cited (including many uncited) papers. However, the functional shape of citation distributions remains elusive to this day. This is because the question is ill-defined: one may formulate it in a variety of different contexts, which generally yield different answers. One may wish, for example, to uncover the distribution from the global citation network including all papers published in all journals at all times, or to restrict the query to specific disciplines or years. The discipline considered matters and is liable to affect the final result. For instance, it is well known that papers in Biology are, on average, much more cited than papers in Mathematics. One may argue that this evidence could still be consistent with similar functional forms for the two disciplines, defined on ranges of different sizes. The role of time is also important. It is unlikely that citation distributions keep exactly the same shape regardless of the specific time window considered, as the dynamics of scientific production has changed considerably in recent years. It is well known, for instance, that the number of papers published per year has been increasing exponentially until now […]. This, together with the much quicker publication times of modern journals, has deeply affected the dynamics of citation accumulation. Moreover, if the dataset under study includes papers published in different years, older papers tend to have more citations than recent ones simply because they have been exposed for a longer time, not necessarily because they are better works: the age of a paper is an important factor.

So, the question of which function best describes citation distributions is meaningless unless one precisely defines the set of publications examined. Redner […] considered all papers published in Physical Review D up to 1997, along with all articles indexed by Thomson Scientific in the period 1981-1997, and found that the right tail of the distribution, corresponding to highly cited papers, follows a power law with exponent $\gamma = 3$, in accord with the conclusions of Price. Laherrere and Sornette […] studied the top 1120 most cited physicists during the period 1981-1997, whose citation distribution is more compatible with a stretched exponential $P(k_{\rm in}) \sim \exp[-(k_{\rm in})^{\beta}]$, with $\beta \approx 0.3$. Tsallis and de Albuquerque analyzed the same datasets used by Redner, plus an additional one including all papers published up to 1999 in Physical Review E, and found that the Tsallis distribution $P(k_{\rm in}) = P(0)/[1 + (\beta - 1)\lambda k_{\rm in}]^{\beta/(\beta - 1)}$, with $\lambda \approx 0.1$ and $\beta \approx 1.5$, consistently fits the whole distribution of citations (not just the tail). More recently, Redner performed an analysis of all papers published in the 110-year history of the journals of the American Physical Society (APS) […], concluding that the log-normal distribution

$$P(k_{\rm in}) = \frac{1}{k_{\rm in}\,\sigma\sqrt{2\pi}} \exp\left\{-\frac{[\ln(k_{\rm in}) - \mu]^2}{2\sigma^2}\right\} \qquad (1)$$

is more adequate than a power law. In other studies, distributions of citations have been fitted with various functional forms: power law […]–[14], log-normal […], Tsallis distribution […], modified Bessel function […], [20], or more complicated distributions [21].

In this paper we examine citation networks in more depth. We considered networks including all papers and their mutual citations within several time windows. We performed a detailed analysis of the shape of the distributions by computing the goodness of fit, with Kolmogorov-Smirnov statistics, of three model functions: simple power law, shifted power law and log-normal. Moreover, we also examined dynamic aspects of the process of citation accumulation, revealing the existence of "bursts", i.e. rapid accretions of the number of citations received by papers. Citation bursts are not compatible with standard models of citation accumulation based on preferential attachment […], in which accumulation is smooth and papers may attract many citations long after publication. Therefore, we propose a model in which the citation attractiveness of a paper depends both on the number of citations already collected by the paper and on an intrinsic attractiveness that decays in time. The resulting picture reproduces both the citation distribution and the presence of bursts.



II. RESULTS 
A. The distribution of cites 

For our analysis we use the citation database of the American Physical Society (APS), described in Materials and Methods. We determine the best fit for the empirical citation distributions with a goodness-of-fit test based on Kolmogorov-Smirnov (KS) statistics [23]. The KS statistic $D$ is the maximum distance between the cumulative distribution function (CDF) of the empirical data and the CDF of the fitted model:

$$D = \max_{k_{\rm in} \geq k_{\rm min}} |S(k_{\rm in}) - P(k_{\rm in})| \qquad (2)$$

Here $S(k_{\rm in})$ is the CDF of the empirical indegree and $P(k_{\rm in})$ is the CDF of the model that best fits the empirical data in the region $k_{\rm in} \geq k_{\rm min}$. By searching the parameter space, the best hypothetical model is the one with the smallest value of $D$ from the empirical data. To test the statistical significance of the hypothetical model, however, we cannot use the value of the KS statistic directly, since the model has been derived from a best fit on the empirical data rather than being an independent hypothesis. So, following Ref. […], we generate synthetic datasets from the model corresponding to the best-fit curve. For instance, if the best fit is the power law $a x^{-\gamma}$, the datasets are generated from this distribution. Each synthetic dataset yields a value $D_{\rm synth}$ of the KS statistic between the dataset and the best-fit curve. These $D_{\rm synth}$ values are compared with $D_{\rm emp}$, i.e. the $D$ value between the original empirical data and the best-fit curve, in order to define a p-value: the p-value is the fraction of $D_{\rm synth}$ values larger than $D_{\rm emp}$. If p is large (close to 1), the model is a plausible fit to the empirical data; if it is close to 0, the hypothetical model is not a plausible fit. We applied this goodness-of-fit test to three hypothetical model distributions: log-normal, simple power law and shifted power law. The log-normal distribution for the indegree $k_{\rm in}$ is given by

$$P(k_{\rm in}) \sim \frac{1}{k_{\rm in}} \exp\left\{-\frac{[\log(k_{\rm in}) - \mu]^2}{2\sigma^2}\right\}, \qquad (3)$$

the simple power law distribution by

$$P(k_{\rm in}) \sim k_{\rm in}^{-\gamma}, \qquad (4)$$

and the shifted power law by

$$P(k_{\rm in}) \sim (k_{\rm in} + k_0)^{-\gamma}. \qquad (5)$$

We used 1000 synthetic distributions to calculate the p-value for each empirical distribution.
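The test just described can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the code used for the analysis: the shifted power law of Eq. (5) is treated as a discrete distribution truncated to a finite support $k_{\rm min}, \dots, k_{\rm max}$ (a hypothetical simplification that makes sampling trivial), its parameters are taken as already fitted, and each synthetic dataset is compared with that same best-fit curve, as in the description above. The helper names `shifted_power_law_pmf`, `ks_distance` and `bootstrap_p_value` are ours.

```python
import numpy as np


def shifted_power_law_pmf(k_min, k_max, k0, gamma):
    """Discrete pmf proportional to (k + k0)^(-gamma), truncated to k_min..k_max (our assumption)."""
    k = np.arange(k_min, k_max + 1)
    w = (k + k0) ** (-float(gamma))
    return k, w / w.sum()


def ks_distance(data, k, pmf):
    """KS statistic of Eq. (2): max |S(k) - P(k)| over the model support k."""
    data = np.asarray(data)
    data = np.sort(data[(data >= k[0]) & (data <= k[-1])])
    emp_cdf = np.searchsorted(data, k, side="right") / data.size  # S(k): fraction of samples <= k
    model_cdf = np.cumsum(pmf)                                      # P(k)
    return float(np.max(np.abs(emp_cdf - model_cdf)))


def bootstrap_p_value(data, k, pmf, n_synth=1000, seed=0):
    """Return (D_emp, p), where p is the fraction of synthetic D values larger than D_emp."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = int(np.count_nonzero((data >= k[0]) & (data <= k[-1])))
    d_emp = ks_distance(data, k, pmf)
    # Each synthetic dataset has the same size as the fitted region of the data
    d_synth = np.array([ks_distance(rng.choice(k, size=n, p=pmf), k, pmf)
                        for _ in range(n_synth)])
    return d_emp, float(np.mean(d_synth > d_emp))


if __name__ == "__main__":
    k, pmf = shifted_power_law_pmf(1, 1000, k0=5.0, gamma=3.0)
    rng = np.random.default_rng(42)
    good = rng.choice(k, size=5000, p=pmf)   # data drawn from the model itself
    bad = rng.integers(1, 1001, size=5000)   # uniform data, clearly not a power law
    print(bootstrap_p_value(good, k, pmf, n_synth=200))
    print(bootstrap_p_value(bad, k, pmf, n_synth=200))
```

Data drawn from the model itself should typically give a p-value well away from 0, whereas the uniform data give a $D_{\rm emp}$ far above every $D_{\rm synth}$ and hence p = 0. A full analysis would also refit the parameters and $k_{\rm min}$, which this sketch deliberately omits.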

Figs. 1a, 1b, 1c and 1d show some fits for datasets corresponding to several time windows (see Materials and Methods). A detailed summary of the goodness-of-fit results is shown in Table I. The simple power law gives a high p-value only when one considers the right tail of the distribution (usually $k_{\rm in} > 20$). The log-normal distribution gives high p-values for early years (before 1970), but after 1970 the p-value is smaller than 0.2. As shown in Figs. 1a and 1b, there is a clear discrepancy in the tail between the best-fit log-normal distribution and the empirical distribution. The shifted power law distribution gives high p-values (above 0.2) for all observation periods. The values of the exponent $\gamma$ of the shifted power law decrease in time, from 5.6 (1950) to 3.1 (2008).

We conclude that the shifted power law is the best 
distribution to fit the data. 
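A one-line expansion (our own illustration, not part of the original analysis) shows why the shifted form can accommodate both ends of the data with a single exponent: the offset $k_0$ flattens the head of the distribution, while the tail keeps a pure power-law decay.

```latex
(k_{\rm in}+k_0)^{-\gamma}
  = k_0^{-\gamma}\left(1+\frac{k_{\rm in}}{k_0}\right)^{-\gamma}
  \approx
  \begin{cases}
    k_0^{-\gamma}, & k_{\rm in} \ll k_0 \quad \text{(nearly flat head)},\\
    k_{\rm in}^{-\gamma}, & k_{\rm in} \gg k_0 \quad \text{(pure power-law tail)}.
  \end{cases}
```

The simple power law of Eq. (4) has no such flat region, which may explain why it is acceptable only above a cutoff $k_{\rm min}$ (Table I).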



B. The distribution of citation bursts 

We now turn our attention to citation "bursts". While there has been sizeable activity in the analysis of bursty behavior in human dynamics [24]–[26], we are not aware of similar investigations for citation dynamics. We compute the relative rate $\Delta k/k = [k_{\rm in}^i(t + \delta t) - k_{\rm in}^i(t)]/k_{\rm in}^i(t)$, where $k_{\rm in}^i(t)$ is the number of citations of paper $i$ at time $t$. The distributions of $\Delta k/k$ with t = 1949, 1969, 1989,
FIG. 1: Empirical citation distributions and best-fit model distributions obtained through the goodness-of-fit test with Kolmogorov-Smirnov statistics. PL: power law. SPL: shifted power law. LN: log-normal.



TABLE I: Summary of the results of the goodness-of-fit test with Kolmogorov-Smirnov statistics on the empirical citation distributions for three test functions: log-normal (LN), simple power law (PL) and shifted power law (SPL). For each function, the p-value and the lower cutoff k_min of the fitted region are given.

Year   LN p-value   LN k_min   PL p-value   PL k_min   SPL p-value   SPL k_min
1950   0.717        2          0.001        6          0.832         2
1955   0.734        3          0.955        16         0.777         2
1960   0.892        7          0.056        9          0.49          2
1965   0.998        14         0.321        19         1.00          14
1970   0.201        2          0.022        12         0.943         9
1975   0.105        2          0.127        17         0.958         12
1980   0.19         2          0.204        20         0.49          2
1985   0.119        3          0.784        39         0.728         2
1990   0.194        2          0.686        46         0.909         2
1995   0.194        2          0.412        39         1.00          2
2000   0.096        2          0.362        43         0.797         3
2005   0.05         2          0.619        47         0.989         6
2008   0.064        2          0.44         47         0.99          5


2007 and $\delta t = 1$ year are shown in Fig. 2a. They are visibly broad, spanning several orders of magnitude. Similar heavy tails of burst size distributions were observed in the dynamics of popularity in Wikipedia and the Web [27]. Notably, the largest bursts take place in the first years after the publication of a paper. This is manifest in Fig. 2b, where we show distributions derived from the same dataset as in Fig. 2a, but including only papers older than 5 (squares) and 10 years (triangles): the tail disappears. In general, more than 90% of large bursts ($\Delta k/k > 3.0$) occur within the first 4 years after publication.
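The burst measure $\Delta k/k$ and the "within the first 4 years" statistic can be computed as sketched below. The input format (one list of citing years per paper, plus publication years) is a hypothetical choice made for illustration, as is our reading of "within the first 4 years" as a paper age $t - t_{\rm pub} < 4$; papers with $k_{\rm in}(t) = 0$ are skipped because the ratio is undefined for them.

```python
import numpy as np


def indegree_at(citation_years, t):
    """k_in^i(t): citations received by each paper up to and including year t."""
    return np.array([np.count_nonzero(np.asarray(ys) <= t) for ys in citation_years])


def relative_bursts(citation_years, pub_years, t, dt=1):
    """Delta k / k over [t, t + dt] for papers published by t with k_in(t) > 0, plus their ages."""
    pub_years = np.asarray(pub_years)
    k_t = indegree_at(citation_years, t)
    k_next = indegree_at(citation_years, t + dt)
    mask = (pub_years <= t) & (k_t > 0)  # ratio undefined for uncited papers
    return (k_next[mask] - k_t[mask]) / k_t[mask], t - pub_years[mask]


def early_fraction(bursts, ages, threshold=3.0, max_age=4):
    """Share of bursts above `threshold` occurring at age < max_age (NaN if there are none)."""
    large = bursts > threshold
    return float(np.mean(ages[large] < max_age)) if large.any() else float("nan")


if __name__ == "__main__":
    citation_years = [[2000, 2001, 2002, 2002, 2002, 2002], [1995, 1996, 2002], [2001]]
    pub_years = [2000, 1990, 2001]
    bursts, ages = relative_bursts(citation_years, pub_years, t=2001)
    print(bursts, ages)  # [2.  0.5 0. ] [ 1 11  0]
```

In the toy example the first paper doubles its citation count within a year of publication, while the older second paper grows by only 50%.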



C. Preferential attachment and age-dependent attractiveness



For many growing networks, cumulative advantage [28], [29], or preferential attachment […], has proven to be a reliable mechanism to explain the observed fat-tailed distributions. In the context of citation dynamics, it is reasonable to assume that a highly cited paper has a better chance of receiving citations in the future than poorly cited papers. This can be formulated by stating that the probability that a paper gets cited is proportional to the number of citations it has already received. That was the original idea of Price […], and it led to the development of the first dynamic mechanism for the generation of power-law distributions in citation networks. Later refinements of the model introduced an attractiveness for the vertices, indicating their own appeal to attract edges regardless of degree. In particular, the so-called linear preferential attachment was introduced […], [13], in which the probability for a vertex to receive a new edge is proportional to the sum of the attractiveness of the vertex and its degree. In this section we check whether this hypothesis holds for our datasets. This issue has been addressed in other works on citation analysis, like Refs. […].

FIG. 2: (a) The four curves correspond to 1949, 1969, 1989 and 2007; the observation window is $\delta t = 1$ year. (b) Here the reference year is 2007, but the burst statistics are limited to the papers published until 2003 (squares) and 1998 (triangles). For comparison, the full curve comprising all papers (circles, as in (a)) is also shown.

We investigated the dependence of the kernel function $\Pi(k_{\rm in})$ on the indegree $k_{\rm in}$ […], [11]. The kernel is the rate at which a vertex $i$ with indegree $k_{\rm in}^i$ acquires new incoming edges. For linear preferential attachment the kernel is

$$\Pi(k_{\rm in}^i) \propto k_{\rm in}^i + A_i. \qquad (6)$$

In Eq. (6), the constant $A_i$ indicates the attractiveness of vertex $i$. Computing the kernel directly for each indegree class (i.e. for all vertices with equal indegree $k_{\rm in}$) is not ideal, as the result may fluctuate heavily for large values of the indegree due to poor statistics. So, following Refs. [13], [33], we consider the cumulative kernel $\Pi_>(k_{\rm in}) = \sum_{k' < k_{\rm in}} \Pi(k')$, which, for the ansatz of Eq. (6), should have the following functional dependence on $k_{\rm in}$:

$$\Pi_>(k_{\rm in}) \sim k_{\rm in}^2 + \langle A\rangle k_{\rm in}. \qquad (7)$$

In Eq. (7), $\langle A\rangle$ is the average attractiveness of the vertices. In order to estimate $\Pi_>(k_{\rm in})$, we compute the probability that vertices with equal indegree have acquired edges over a given time window, and sum the results over all indegree values from the smallest one up to a given $k_{\rm in}$. The time window has to be short enough to preserve the structure of the network, but not so short that the citation statistics become insufficient. In Fig. 3 we show the cumulative kernel function $\Pi_>(k_{\rm in})$ as a function of indegree for a time window from 2007 to 2008. The profile of the curve (empty circles) is compatible with linear preferential attachment with an average attractiveness $\langle A\rangle = 7.0$ over a large range, although the final part of the tail is missed. Still, the slope of the tail on a double-logarithmic scale, apart from the final plateau, is close to 2, as in Eq. (7). Our result is consistent with that of Jeong et al. [13], who considered a citation network of papers published in Physical Review Letters in 1988, which are part of our dataset as well. We repeated this analysis for several datasets, from 1950 until 2008, keeping a time window of one year in each case. The resulting values of $\langle A\rangle$ are reported in Table II, along with the number of vertices and the mean degree of the networks. The average value of the attractiveness across all datasets is 7.1. This value is much larger than the average indegree in the early stages of the network, for example from 1950 to 1960. Hence, in the tradeoff between indegree and attractiveness in Eq. (6), the latter is quite important for old papers. In general, for low indegrees, attractiveness dominates over preferential attachment. Indeed, as Fig. 3 shows, for low indegrees there is no power-law dependence of the kernel on indegree.
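A minimal sketch of the cumulative-kernel test follows. It assumes (our choice; the normalization used for Fig. 3 is not specified) that $\Pi(k')$ is estimated as the mean number of new citations received during the window by vertices that had indegree $k'$ at its start, divided by the total number of new citations. The fit of Eq. (7) is done by linear least squares on the two terms $k_{\rm in}^2$ and $k_{\rm in}$, so $\langle A\rangle$ is the ratio of the two coefficients and the overall normalization cancels. The function names are hypothetical, and restricting the fit range (e.g. excluding the final plateau) is left to the caller.

```python
import numpy as np


def cumulative_kernel(k_start, k_end):
    """Empirical Pi_>(k) = sum over k' < k of Pi(k'), from indegrees at the start and end of a window.

    Pi(k') is estimated as the mean number of new citations gained by vertices that had
    indegree k' at the start of the window, divided by all new citations in the window.
    """
    k_start = np.asarray(k_start)
    gained = np.asarray(k_end) - k_start
    degrees = np.unique(k_start)
    pi = np.array([gained[k_start == k].mean() for k in degrees]) / gained.sum()
    cum = np.concatenate(([0.0], np.cumsum(pi)[:-1]))  # strict inequality k' < k
    return degrees, cum


def fit_attractiveness(k, cum):
    """Least-squares fit of Pi_>(k) ~ C (k^2 + <A> k), the form of Eq. (7); returns <A>."""
    k = np.asarray(k, dtype=float)
    X = np.column_stack([k ** 2, k])
    (c2, c1), *_ = np.linalg.lstsq(X, np.asarray(cum, dtype=float), rcond=None)
    return c1 / c2


if __name__ == "__main__":
    degrees, cum = cumulative_kernel([0, 0, 1, 1, 2], [1, 0, 3, 1, 5])
    print(degrees, cum)
    k = np.arange(1, 50)
    print(fit_attractiveness(k, 3.0 * (k ** 2 + 7.0 * k)))  # recovers <A> = 7 from exact data
```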



Finally, we investigated the time dependence of the kernel. As shown in Fig. 3, when we limit the analysis to papers older than 5 years (squares) or 10 years (triangles), the cumulative kernel has a purely quadratic dependence on indegree in the initial part, without linear terms, so the attractiveness does not affect the citation dynamics. This means that the attractiveness has a significant influence on the evolution of the citation network only within the first few years after the publication of the papers. The presence of vertex attractiveness had been considered by Jeong et al. as well [13].
FIG. 3: Cumulative kernel function of the citation network from 2007 to 2008. The continuous line is $C k_{\rm in}(k_{\rm in} + A)$ with A = 7.0, where C is a constant. The dashed line corresponds to the case without attractiveness (A = 0.0).



TABLE II: Statistics of the empirical citation networks: N is the number of vertices in the network; <k_in> is the average indegree of the network; <A> is the average attractiveness, determined from the tests of linear preferential attachment discussed in the text.

Year   N        <k_in>   <A>
1950   15880    2.2      4.2
1955   23350    3.1      5.3
1960   30996    3.7      6.2
1965   42074    4.3      5.4
1970   62382    5.1      7.2
1975   85590    5.6      7.9
1980   108794   6.0      7.8
1985   138206   6.2      9.0
1990   180708   6.5      7.4
1995   238142   7.0      7.3
2000   305570   7.7      6.8
2005   386569   8.5      6.4
2008   441595   9.0      7.0



D. The model 



We would like to design a microscopic model that reflects the observed properties of our citation networks. Preferential attachment does not account for the fact that the probability to receive citations may depend on time. In the Price model, for instance, papers keep collecting citations independently of their age, while it is empirically observed […], […], [13] that the probability for an article to get cited decreases as its age increases. In addition, we have seen that citation bursts typically occur in the early life of a paper. Some sophisticated growing network models include the aging of vertices as well [33], [37]–[40]. We propose a mechanism based on linear preferential attachment, where papers have individual values of the attractiveness, and the latter decays in time.

The model works as follows. At each time step $t$, a new vertex $i$ joins the network (i.e., a new paper is published). The new vertex/paper has $m$ references to existing vertices/papers. The probability $\Pi(i \to j, t)$ that the new vertex $i$ points to a target vertex $j$ with indegree $k_{\rm in}^j$ reads

$$\Pi(i \to j, t) \propto k_{\rm in}^j + A_j(t), \qquad (8)$$

where $A_j(t)$ is the attractiveness of $j$ at time $t$. If $A_j(t)$ were constant and equal for all vertices, we would recover the standard linear preferential attachment […]. We instead assume that it decays exponentially in time:

$$A_j(t) = A_0 \exp[-(t - t_0)/\tau]. \qquad (9)$$

In Eq. (9), $A_0$ is the initial attractiveness of the vertex and $t_0$ is the time at which the vertex first appears in the network; $\tau$ is the time scale of the decay, after which the attractiveness drops considerably and loses importance for citation dynamics. Since citation bursts occur in the initial phase of a paper's life (Fig. 2b), when vertex attractiveness is most relevant, we expect the values of the initial attractiveness to be heterogeneously distributed, to account for the broad distribution of burst sizes (Fig. 2a). We assume the power-law distribution

$$P(A_0) \sim A_0^{-\alpha}. \qquad (10)$$

We performed numerical simulations of the model with 
parameters obtained from the empirical data. We use 
FIG. 4: Comparison of the citation distributions from the empirical data and our model. For all cases, we used $\alpha = 2.5$ and $\tau = 1$ year. (a) For 2008, N = 4415905, $\langle k\rangle = 9.0$. (b) For 1990, N = 180708, $\langle k\rangle = 6.5$. (c) For 1970, N = 62382, $\langle k\rangle = 5.6$. (d) For 1950, N = 1950, $\langle k\rangle = 3.1$. Here N is the number of papers/vertices and $\langle k\rangle$ is the average number of citations/indegree.



$\alpha = 2.5$, $\tau = 1$ year and $A_{\min} < A_0 < 0.002\,N(t)$, where $N(t)$ is the number of papers at time $t$. The upper bound represents the largest average indegree of our citation networks, expressed in terms of the number of vertices. The value of $A_{\min}$ depends on the value of the attractiveness obtained from the empirical data. We set $A_{\min} = 25.0$ for most years; for 1950 we set $A_{\min} = 14.5$, because there $\langle A\rangle$ is smaller than 7.1. The result is, however, not very sensitive to the minimum and maximum values of $A_0$. Figs. 4a, 4b, 4c and 4d show the citation distributions of the empirical data versus the model prediction. The model reproduces the empirical distributions very well at different phases in the evolution of the APS citation network, from the remote 1950 (panel d) until the very recent 2008 (panel a).
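The model of Eqs. (8)-(10) can be simulated with a short loop, sketched below under several assumptions of ours: time is measured in steps of one new paper each, so $\tau$ is expressed in steps rather than years; the first $m$ papers form a seed that cites nothing; each new paper cites $m$ distinct earlier papers; and $A_0$ is drawn by inverse-transform sampling from the power law of Eq. (10), truncated to $[A_{\min}, \max(A_{\min}, 0.002\,N(t))]$. The function names are hypothetical, and this naive version costs O(N) per step; it is meant only to make the mechanism concrete, not to reproduce Fig. 4.

```python
import numpy as np


def sample_initial_attractiveness(rng, alpha, a_min, a_max):
    """Inverse-transform sample from P(A0) ~ A0^(-alpha) on [a_min, a_max], alpha != 1."""
    lo, hi = a_min ** (1.0 - alpha), a_max ** (1.0 - alpha)
    u = rng.random()
    return (lo + u * (hi - lo)) ** (1.0 / (1.0 - alpha))


def simulate(n_papers, m, alpha=2.5, tau=100.0, a_min=25.0, a_max_frac=0.002, seed=0):
    """Grow a citation network with linear preferential attachment and decaying attractiveness.

    Returns the final indegrees and the initial attractiveness A0 of every paper.
    """
    rng = np.random.default_rng(seed)
    k_in = np.zeros(n_papers)
    a0 = np.empty(n_papers)
    t0 = np.arange(n_papers, dtype=float)  # paper j appears at time step j
    for t in range(n_papers):
        # Eq. (10) with the upper bound 0.002 N(t), where N(t) = t + 1 papers so far
        a_max = max(a_min, a_max_frac * (t + 1))
        a0[t] = sample_initial_attractiveness(rng, alpha, a_min, a_max)
        if t < m:
            continue  # seed papers have no references
        # Eqs. (8)-(9): weight of earlier paper j is k_in^j + A0_j * exp(-(t - t0_j) / tau)
        w = k_in[:t] + a0[:t] * np.exp(-(t - t0[:t]) / tau)
        w = np.maximum(w, 1e-12)  # guard against underflow so every paper stays selectable
        targets = rng.choice(t, size=m, replace=False, p=w / w.sum())
        k_in[targets] += 1  # targets are distinct, so this increments each once
    return k_in, a0


if __name__ == "__main__":
    k_in, a0 = simulate(2000, m=5, tau=50.0, seed=1)
    print(k_in.max(), np.bincount(k_in.astype(int))[:10])
```

With small networks the upper bound $0.002\,N(t)$ stays below $A_{\min}$, so every paper gets $A_0 = A_{\min}$; heterogeneity in $A_0$ appears only once $N(t)$ exceeds $A_{\min}/0.002 = 12500$ papers with the default $A_{\min} = 25$.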



The distributions of citation burst magnitude $\Delta k/k$ for the data and the model are shown in Fig. 5a. For a better comparison between data and model, we "evolve" the network according to the model starting from the structure of the empirical citation network at the beginning of the time window for the detection of the bursts, and we stop the evolution after the observation time $\delta t$ has elapsed. In Fig. 5a we consider 1989 and 2007, with a time window of 1 year for the burst detection. The model successfully reproduces the empirical distributions of burst size. In Fig. 5b we consider much longer observation periods for the bursts, of 5 and 10 years. Still, the model gives an accurate description of the tail of the empirical curve in both cases.



III. DISCUSSION 



We investigated citation dynamics for networks of papers published in journals of the American Physical Society. Kolmogorov-Smirnov statistics along with goodness-of-fit tests lead us to conclude that the best ansatz for the distribution of citations (from early times up to any given year) is a shifted power law. The latter beats both simple power laws, which are acceptable only on the right tails of the distributions, and log-normals, which are better than simple power laws on the left part of the curve but are not accurate in the description of the right tails. We have also studied dynamic properties of citation flows and found that the early life of papers is characterized by citation bursts, as already found for popularity dynamics in Wikipedia and the Web.

The existence of bursts is not compatible with traditional models based on preferential attachment, which can account for the skewed citation distributions observed but in which citation accumulation is smooth. Therefore we have introduced a variant of linear preferential attachment with two new features: 1) the attractiveness decays exponentially in time, so it plays a role only in the early life of papers, after which it is dominated by the number of citations accumulated; 2) the attractiveness is not the same for all vertices but follows a heterogeneous (power-law) distribution. We have found that this simple model accurately describes the distributions of citations and burst sizes across very different scientific ages. Moreover, the model is fairly robust with respect to the choice of the observation window for the bursts.
FIG. 5: Comparison of the distributions of citation burst size from the empirical data and the model. The exponent $\alpha$ of the distribution of initial attractiveness is 2.5, as in Fig. 4. (a) The reference years are 1989 (squares) and 2007 (circles); the observation window for the bursts is $\delta t = 1$ year in both cases. (b) Here the reference years are 1998 (squares) and 2003 (circles), and the observation windows for the bursts are 10 and 5 years, respectively.



IV. MATERIALS AND METHODS 

Our citation database includes all papers published in journals of the American Physical Society (APS) from 1893 to 2008, except papers published in Reviews of Modern Physics. There are 3 992 736 citations among 414977 papers at the end of 2008. The journals we considered are Physical Review (PR), Physical Review Letters (PRL), Physical Review A (PRA), Physical Review B (PRB), Physical Review C (PRC), Physical Review D (PRD), Physical Review E (PRE), Physical Review - Series I (PRI), Physical Review Special Topics - Accelerators and Beams (PRSTAB), and Physical Review Special Topics - Physics Education Research (PRSTPER). From these data, we constructed time-aggregated citation networks from 1950 to a year x, with x = 1951, 1952, ..., 2007, 2008.
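The construction of the time-aggregated networks can be sketched as follows, under our reading of the text that the network for year x contains the papers published between 1950 and x and only the citations among them. The input format (a mapping from paper identifiers to publication years and a list of (citing, cited) pairs) and the function names are hypothetical.

```python
from collections import Counter


def aggregated_indegrees(pub_year, citations, start=1950, end=2008):
    """Indegree of every paper in the time-aggregated network [start, end].

    pub_year:  dict paper_id -> publication year (hypothetical input format)
    citations: iterable of (citing_id, cited_id) pairs
    """
    keep = {p for p, y in pub_year.items() if start <= y <= end}
    indeg = {p: 0 for p in keep}  # uncited papers keep indegree 0
    for citing, cited in citations:
        if citing in keep and cited in keep:  # only citations among papers in the window
            indeg[cited] += 1
    return indeg


def indegree_histogram(indeg):
    """Number of papers with each indegree value, i.e. the unnormalized P(k_in)."""
    return dict(sorted(Counter(indeg.values()).items()))


if __name__ == "__main__":
    pub_year = {"a": 1949, "b": 1950, "c": 1951, "d": 1960}
    cites = [("c", "b"), ("d", "b"), ("d", "c"), ("b", "a")]
    print(aggregated_indegrees(pub_year, cites, end=1955))  # {'b': 1, 'c': 0} (key order may vary)
```

Repeating the call for each year x gives the sequence of networks on which the distributions and kernels above are measured.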